Abstract

If pricing kernels are assumed non-negative then the inverse problem 
of finding the pricing kernel is well-posed. The constrained least squares 
method provides a consistent estimate of the pricing kernel. When the 
data are limited, a new method is suggested: relaxed maximization of the 
relative entropy. This estimator is also consistent. 
1 Introduction



Modern finance theory postulates that the price of a security is an integral of 
its future payoff multiplied by a pricing kernel: 

$$S(x, \theta) = \int F(x', \theta)\, p(x, x')\, dx'. \tag{1}$$

Here $S$ represents the security price, $x$ and $x'$ the current and future values of
the stochastic factors relevant for pricing the security, $\theta$ a non-stochastic parameter,
$F$ the future payoff, and $p$ the pricing kernel. The pricing kernel is of great
interest to finance theory because it sheds light on investors' preferences over
current and delayed consumption. Practitioners are also interested in the pricing
kernel because it helps in pricing new securities, finding mispriced assets, and
managing risk.^1 Not surprisingly, when it was discovered that the pricing kernel
can be recovered from option prices,^2 financial economists rushed to invent
new and better methods for estimating it.^3 Many
of these methods, however, are heuristic and lack a rigorous proof of consistency.
This paper focuses on providing a simple proof of consistency for the constrained
least squares method and for a modified maximum entropy method.

Mathematically, pricing kernel estimation is an inverse problem. A linear
operator maps a set of functions ("pricing kernels") into another set of functions
("security prices"), and the problem is to invert this operator. Often the
problem is further complicated by the fact that prices are observed only
for a discrete set of securities and are contaminated with noise. Inverse
problems of this kind appear frequently in diverse areas of applied mathematics and
have been thoroughly studied.^4

The pricing kernel estimation is, however, special and what makes it special 
is that the pricing kernel must be non-negative to prevent the existence of 
systematic arbitrage opportunities. 

This restriction on the operator's domain helps a lot. Without it, the inverse 
problem is ill-posed, that is, the pricing operator does not have a continuous 
inverse. Intuitively, small changes in security prices could lead to large changes 
in the estimate of the pricing kernel. In addition, without this restriction, the 
least squares method of estimation is inconsistent. The pricing kernel selected 
by the least squares would fit the prices exactly but would not converge to 
the true pricing kernel. In contrast, non-negativity of the pricing kernel makes 
the corresponding inverse problem well-posed and the least squares method 
consistent. 

^1 See, for example, applications in Jackwerth (2000), Ait-Sahalia and Lo (2000), and
Rosenberg and Engle (2002).

^2 By Breeden and Litzenberger (1978) and Banz and Miller (1978), and revived by
Rubinstein (1994).

^3 An incomplete list includes Jackwerth and Rubinstein (1996), Avellaneda et al. (1997),
Soderlind and Svensson (1997), Melick and Thomas (1997), Ait-Sahalia and Lo (1998),
Avellaneda (1998), Jackwerth (2000), Ait-Sahalia and Duarte (2003), and
Bondarenko (2003).

^4 See reviews in Tikhonov and Arsenin (1977), O'Sullivan (1986), and Engl (2000).
The key to well-posedness is that for pricing purposes it is enough to
estimate the distribution function of the pricing kernel, the cumulative pricing kernel.
These functions form a Banach space with respect to the uniform convergence
topology, and we can apply one of the Banach theorems: a continuous one-to-one
operator on a Banach space has a continuous inverse. Consequently, the
inverse problem is well-posed.

What can be said about consistency? By well-posedness, estimating the
kernel can be reduced to estimating the price function from noisy observations: the
map from price functions to pricing kernels is continuous and cannot inflate the 
error of estimation. Luckily, the problem of estimating the price function is the 
classic problem of non-parametric estimation of a regression function, and for 
this problem the conditions of the least squares consistency are well known. It 
turns out that they are satisfied provided the pricing kernel is non-negative. In- 
tuitively, additional information about the structure of pricing kernels prevents 
overfitting of the regression function and forces consistent convergence of the 
estimates. Together with well-posedness, this implies that the constrained least 
squares estimates the cumulative pricing kernel consistently. 

While asymptotically consistent, the constrained least squares method may, however,
perform unsatisfactorily in small samples. This is because the method ignores
prior information about the pricing kernel. One way to mitigate the problem is
to include in the objective function a term that measures the distance from a
prior pricing kernel. This idea leads to a method that combines the
advantages of both the least squares and the maximum entropy methods. The
method minimizes a weighted sum of the mean squared pricing error and the
relative entropy with respect to the prior. With suitably chosen parameters, this method is also consistent.

Let me briefly describe the related literature. The maximum entropy method
for estimating the pricing kernel was developed by Buchen and Kelly (1996) and
Stutzer (1996), following a suggestion in Rubinstein (1994), and elaborated by
Avellaneda et al. (1997), Avellaneda (1998), and Frittelli (2000). These papers
typically assume that the securities are priced correctly but only a sparse discrete
set of prices is known. For the alternative case of a large amount of noisy data,
methods of pricing kernel estimation based on smoothing or other ideas were
developed by Jackwerth and Rubinstein (1996), Ait-Sahalia and Lo (1998), Jackwerth (2000),
and Bondarenko (2003), among others. Implicitly, these papers address the
problem of ill-posedness of kernel estimation by the classic method of regularization.
This paper is different because it shows that on the restricted domain of
non-negative kernels the problem is well-posed and so does not need additional
regularization.



Ait-Sahalia and Duarte (2003) estimate the pricing kernels by smoothing 
the constrained least squares estimator, and refer to the statistical literature 
for the proof of consistency. In this paper, we provide an explicit proof of the 
constrained least squares consistency and consider another modification of the 
method based on the idea of entropy distance minimization. 

The rest of the paper is organized as follows. Section 2 reviews the basics
of the theory of pricing kernels. Section 3 shows that the problem of finding the
non-decreasing cumulative pricing kernel is well-posed. Section 4 demonstrates the
consistency of the least squares method. Section 5 explains how the idea of
maximum entropy can be used to improve the least squares method, and proves the
consistency of the modification. Section 6 concludes.



2 What is the pricing kernel? 

The pricing kernel, $p$, is a function of stochastic factors that allows pricing
securities by using their future payoff functions:

$$S(x, \theta) = \int F(x', \theta)\, p(x, x')\, dx'. \tag{2}$$

Often, the choice of units in which the stochastic factors are measured is arbitrary,
so we can normalize the initial level of factors: $x = 1$. Slightly abusing
notation, we will denote $p(1, x)$ as $p(x)$. Let us define the cumulative pricing kernel
as follows:

$$P(x) = \int_0^x p(t)\, dt. \tag{3}$$



With these notations, the pricing formula can be rewritten in a more convenient 
form: 

$$S(\theta) = \int F(x, \theta)\, dP(x). \tag{4}$$

Non-negativity of the pricing kernel, implied by the absence of arbitrage
opportunities (Harrison and Kreps (1979)), translates into monotonicity of the
cumulative pricing kernel: $P(x)$ is non-decreasing. In addition, the price of the
security that has a unit payoff is finite, so $P(x)$ is bounded.

We will be interested in pricing kernels that depend only on one factor. For
example, when the class of securities consists of options written on another security,
this factor is the price of the underlying security. We can further simplify
the problem by noting that the payoff of most derivative securities that occur in
practice can be represented as a linear combination of the underlying security and
a security that has a non-zero payoff only if the price of the underlying is less than a
certain bound, $B$. Therefore, we can concentrate on pricing derivatives with
finite support, and then by integration by parts we have the following pricing
formula:

$$S(\theta) = -\int_0^B P(x)\, dF(x, \theta), \tag{5}$$

where $B$ is such that $F(x, \theta) = 0$ for $x > B$.

The next lemma shows that estimating the cumulative pricing kernel is sufficient
for pricing purposes. Let $P_n(x)$ be an estimate of $P(x)$. Let $S_n$ be the
corresponding price of the derivative from (5).

Lemma 1 Suppose $F(x)$ has bounded variation and $P_n$ converges to $P$ in the uniform
metric as $n$ goes to $\infty$. Then $S_n$ converges to $S$.
Proof:

$$|S_n - S| = \left| \int_0^B [P_n(x) - P(x)]\, dF(x) \right| \le \int_0^B |P_n(x) - P(x)|\, |dF(x)| \tag{6}$$
$$\le C\, \|P_n(x) - P(x)\|_\infty, \tag{7}$$

where $C$ is the total variation of $F(x)$. QED.
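As a quick numerical illustration of the bound in Lemma 1, the sketch below takes a put payoff $F(x) = \max(K - x, 0)$, whose total variation on $[0, B]$ equals $K$, and compares the pricing error with $C$ times the uniform distance between two cumulative kernels. The kernel $P(x) = x^2$, the perturbation, the values of $B$ and $K$, and all function names are assumptions made only for this example.

```python
import math

def price(P, K, n=4000):
    """Price -int_0^B P dF for the put payoff F(x) = max(K - x, 0).

    For this payoff dF = -dx on [0, K] and dF = 0 beyond K, so the price is
    int_0^K P(x) dx, approximated here by the midpoint rule.
    """
    h = K / n
    return h * sum(P((i + 0.5) * h) for i in range(n))

def sup_distance(P, Q, B, n=4000):
    """Grid approximation of the uniform distance between P and Q on [0, B]."""
    return max(abs(P(B * i / n) - Q(B * i / n)) for i in range(n + 1))

B, K = 1.0, 0.8                      # hypothetical bound and strike
P = lambda x: x * x                  # hypothetical "true" cumulative kernel
for delta in (0.1, 0.01, 0.001):
    # Perturbed estimate whose uniform distance from P is about delta.
    P_n = lambda x, d=delta: x * x + d * math.sin(5.0 * x) ** 2
    gap = abs(price(P_n, K) - price(P, K))
    bound = K * sup_distance(P_n, P, B)   # C * ||P_n - P||_inf with C = K
    print(f"delta={delta}: |S_n - S| = {gap:.6f}, bound = {bound:.6f}")
```

The pricing error shrinks in proportion to the uniform distance, which is the content of the lemma for this particular payoff.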

Consider now how we can estimate the pricing kernel. Typically, this is done
by using the prices of puts. A European put with strike $K$ is a security that
will pay

$$F(x, K) = \max(K - x, 0) \tag{8}$$

at the expiration date if the price of the underlying security is $x$ on that date.
Then, the price of the put with strike $K$ is



$$S(K) = \int_0^K (K - x)\, dP(x) \tag{9}$$
$$= \int_0^K P(x)\, dx. \tag{10}$$

The operator of interest is then

$$A : P(x) \mapsto S(K) = \int_0^K P(x)\, dx. \tag{11}$$
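The two expressions for the put price, the Stieltjes integral of $K - x$ against $dP$ and the plain integral of $P$ over $[0, K]$, can be compared numerically. The following is a minimal sketch, assuming a hypothetical cumulative kernel $P(x) = x^2$ with $P(0) = 0$ (for which both integrals equal $K^3/3$); the function names are illustrative only.

```python
def put_price_stieltjes(P, K, n=2000):
    """Approximate int_0^K (K - x) dP(x) by a Riemann-Stieltjes sum."""
    h = K / n
    return sum((K - (i + 0.5) * h) * (P((i + 1) * h) - P(i * h)) for i in range(n))

def put_price_integral(P, K, n=2000):
    """Approximate int_0^K P(x) dx by the trapezoid rule."""
    h = K / n
    inner = sum(P(i * h) for i in range(1, n))
    return h * (0.5 * (P(0.0) + P(K)) + inner)

P = lambda x: x * x          # hypothetical cumulative kernel with P(0) = 0
for K in (0.25, 0.5, 1.0):
    a, b = put_price_stieltjes(P, K), put_price_integral(P, K)
    print(f"K={K}: Stieltjes={a:.6f}, integral={b:.6f}, closed form={K ** 3 / 3:.6f}")
```

The agreement relies on $P(0) = 0$, which is what makes the boundary term in the integration by parts vanish.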



In a more general setting, we are interested in the inverse problem defined by the
operator

$$A_F : P(x) \mapsto S(\theta) = -\int_0^B P(x)\, dF(x, \theta). \tag{12}$$

As Lemma 1 shows, operator $A_F$ is continuous in the uniform metric ($L^\infty$).
We are interested in knowing whether its inverse is continuous, that is, whether small
deviations in prices can lead to large deviations in the pricing kernel. We also
need to know whether the pricing kernel can be consistently estimated from noisy and
discrete data. These problems are handled in the next two sections.



3 Pricing problem is well-posed. 

A problem $Ax = y$ is called well-posed if the operator $A$ has a continuous
inverse. It is implicit in this definition that the operator is given with its domain,
and that topologies in both the range and the domain are specified: the same
operator may be ill-posed on one domain and well-posed on another. The
concept of well-posedness was introduced in mathematical physics by Hadamard as
a tool to select the linear problems that could arise from a physical problem.
Later, however, it was discovered that many important problems are ill-posed,
and methods for their solution were derived (Tikhonov and Arsenin (1977),
O'Sullivan (1986), Engl (2000)).






If no restrictions on pricing kernels were imposed, then the operator in (11)
would correspond to an ill-posed problem. Indeed, it is easy to see ill-posedness
from the following example:

$$\alpha \cos(\beta x) \mapsto \int_0^K \alpha \cos(\beta x)\, dx = \frac{\alpha}{\beta} \sin(\beta K). \tag{13}$$

Consider the uniform convergence metric on both the domain and the range of
the operator. If we set $\beta = \alpha/\varepsilon$, then the norm of the function on the left-hand
side is constant, $\|\alpha \cos(\beta x)\|_\infty = \alpha$, but its image can be made arbitrarily close
to zero: $\|A(\alpha \cos(\beta x))\|_\infty = \|\varepsilon \sin(\beta K)\|_\infty \le \varepsilon$. Therefore the inverse operator
acts discontinuously.
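The oscillating example can be checked numerically. The sketch below, with an assumed amplitude $\alpha = 1$ and interval $[0, 1]$, evaluates the sup norm of $\alpha \cos(\beta x)$ and of its image $(\alpha/\beta) \sin(\beta K)$ on a grid for several values of $\varepsilon$; the helper names are hypothetical.

```python
import math

def sup_norm(f, a, b, n=10_000):
    """Approximate the sup norm of f on [a, b] using a uniform grid."""
    return max(abs(f(a + (b - a) * i / n)) for i in range(n + 1))

def image_under_A(alpha, beta):
    """Closed-form image of alpha*cos(beta*x): K -> (alpha/beta) * sin(beta*K)."""
    return lambda K: (alpha / beta) * math.sin(beta * K)

alpha, B = 1.0, 1.0                       # hypothetical amplitude and interval
for eps in (1e-1, 1e-2, 1e-3):
    beta = alpha / eps
    norm_in = sup_norm(lambda x: alpha * math.cos(beta * x), 0.0, B)
    norm_out = sup_norm(image_under_A(alpha, beta), 0.0, B)
    print(f"eps={eps:g}: ||input|| = {norm_in:.3f}, ||image|| = {norm_out:.3g}")
```

The input norm stays at $\alpha$ while the image norm is bounded by $\varepsilon$, so no bound of the form $\|P\| \le c\, \|AP\|$ can hold on the unrestricted domain.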

However, for the restricted domain of cumulative pricing kernels we have the 
following theorem: 

Theorem 1 If $A_F$ is injective then it defines a well-posed problem on the space
of all non-decreasing continuous functions with the uniform convergence topology.

Proof: In the uniform convergence topology the space of non-decreasing continuous
functions is complete. If $A_F$ is injective, then it defines a continuous
one-to-one correspondence between this space and its image. The conclusion
of the theorem follows from one of the Banach theorems (see, for example,
Theorem 11 in Chapter 15 of Lax (2002)): a linear operator that establishes
a continuous one-to-one correspondence between two complete normed linear
spaces has a continuous inverse. QED.

Corollary 1 Operator $A$ from (11) defines a well-posed problem on the space
of all non-decreasing continuous functions with the uniform convergence topology.

Proof: Since $A$ is injective, Theorem 1 can be applied. QED.

In practice security prices are known only up to an error. This error includes the
bid-ask spread, non-stationarity in the pricing kernel, market inefficiencies, and
so on. We will use Theorem 1 and Corollary 1 as tools to prove that as the
amount of data grows the constrained least squares method estimates the pricing kernel
consistently.



4 Estimation by least squares is consistent. 

Let $(\Omega, \Sigma, \Pr)$ be a probability space and $\varepsilon_i$ be a sequence of independent
identically distributed random variables with zero expectation and finite variance.
Let also $x_i$ be a sequence of points located between $0$ and $B$, which has a positive
density on $[0, B]$. We will say that the constrained least squares method estimates
a function $f$ from a set $\mathcal{F}$ consistently in norm $\|\cdot\|$ relative to operator $A$ if for any
$\delta > 0$, with probability 1 there exists an $N_0$ such that for $N > N_0$ there exists

$$f_N = \arg\min_{\tilde{f} \in \mathcal{F}} \sum_{i=1}^{N} \left( A\tilde{f}(x_i) - Af(x_i) - \varepsilon_i \right)^2, \tag{14}$$

and $\|f - f_N\| < \delta$. In other words, with probability 1, the sequence of estimates
$f_N$ converges to the true function.

Here we are interested in the set, $\mathcal{P}$, of non-decreasing, continuous, bounded
functions on the interval $[0, B]$. We use the uniform convergence topology and the
operator $A$ from (11).

Theorem 2 The constrained least squares method estimates any function in $\mathcal{P}$
consistently in $L^\infty$ relative to operator $A$.

Proof: Let $\mathcal{R}$ be the image of $\mathcal{P}$ under operator $A$. Then $\mathcal{R}$ is the set
of convex, increasing, continuous, bounded functions. By Theorem 1,
operator $A$ has a continuous inverse from $\mathcal{R}$ to $\mathcal{P}$. Consequently, consistency of
estimating functions from $\mathcal{P}$ relative to operator $A$ is equivalent to consistency
of estimating unmodified functions from $\mathcal{R}$. For $\mathcal{R}$, we can apply classic results
on the non-parametric estimation of convex functions. In particular, according
to the main theorem in Hanson and Pledger (1976), convex functions can be
estimated consistently in the $L^\infty$ norm by the constrained least squares method.
QED.
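To make the constrained least squares estimator concrete, here is a minimal sketch under stated assumptions. The cumulative kernel is restricted to step functions that jump by $w_j \ge 0$ at fixed knots $t_j$, so that the put price is $S(K) = \sum_j w_j \max(K - t_j, 0)$ and monotonicity of $P$ becomes the sign constraint $w \ge 0$. The knots, the simulated data (true kernel $P(x) = x^2$, Gaussian noise), and the projected gradient solver are all illustrative choices, not the procedure analyzed in the theorem.

```python
import random

def hinge_design(strikes, knots):
    """M[i][j] = max(K_i - t_j, 0): put price at K_i generated by a unit jump of P at t_j."""
    return [[max(K - t, 0.0) for t in knots] for K in strikes]

def rss(M, s, w):
    """Residual sum of squares sum_i ((M w)_i - s_i)^2."""
    return sum((sum(m * x for m, x in zip(row, w)) - si) ** 2 for row, si in zip(M, s))

def fit_nonneg_least_squares(M, s, iters=2000):
    """Projected gradient descent for min_w rss(M, s, w) / 2 subject to w >= 0."""
    n, m = len(M), len(M[0])
    # The squared Frobenius norm bounds the largest eigenvalue of M^T M,
    # so this step size keeps the iteration monotone.
    step = 1.0 / sum(v * v for row in M for v in row)
    w = [0.0] * m
    for _ in range(iters):
        resid = [sum(M[i][j] * w[j] for j in range(m)) - s[i] for i in range(n)]
        grad = [sum(M[i][j] * resid[i] for i in range(n)) for j in range(m)]
        w = [max(w[j] - step * grad[j], 0.0) for j in range(m)]   # project onto w >= 0
    return w

random.seed(0)
strikes = sorted(random.uniform(0.0, 1.0) for _ in range(30))
# Hypothetical truth P(x) = x^2, so S(K) = K^3 / 3, observed with Gaussian noise.
prices = [K ** 3 / 3 + random.gauss(0.0, 0.005) for K in strikes]
knots = [j / 8 for j in range(8)]
M = hinge_design(strikes, knots)
w = fit_nonneg_least_squares(M, prices)
print("fitted jumps of P:", [round(x, 4) for x in w])
print("RSS at fit:", rss(M, prices, w), "RSS at zero:", rss(M, prices, [0.0] * len(w)))
```

Because every fitted price function is a non-negative combination of hinge functions, it is automatically convex and non-decreasing in the strike, which is the shape restriction the proof relies on.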

For a more general operator $A_F$ from (12) we have a similar theorem, which,
however, needs a more advanced technique and comes to a weaker conclusion.
Let us call the payoff function $F(x, \theta)$ uniformly Lipschitz in $\theta$ if

$$|F(x, \theta_1) - F(x, \theta_2)| \le C\, |\theta_1 - \theta_2|, \tag{15}$$

where $C$ does not depend on $x$. Also let us call $F(x, \theta)$ uniformly bounded in
variation if its total variation over $x \in [0, B]$ is bounded by a constant that
does not depend on $\theta$.

Theorem 3 If $F(x, \theta)$ is uniformly Lipschitz in $\theta$ and uniformly bounded in
variation, and $A_F$ is injective, then the constrained least squares method estimates any
function in $\mathcal{P}$ consistently in $L^2$ relative to operator $A_F$.

In the proof we will again aim to show that any function in $\mathcal{R} = A_F(\mathcal{P})$
can be estimated consistently by the constrained least squares method. We are going
to do it by referring to a theorem in van de Geer (1987). First, let us introduce
several additional concepts. Let $X$ be a set of functions on a Euclidean space and let $M_n(\delta, X)$
be the minimal number of elements in a $\delta$-covering of the set $X$, if the distance is
measured by the norm

$$\|f\|_n^2 = \frac{1}{n} \sum_{i=1}^{n} [f(x_i)]^2. \tag{16}$$

Then the $\delta$-entropy of a set is defined as

$$N_n(\delta, X) = \log M_n(\delta, X). \tag{17}$$

Note that the $\delta$-entropy depends on the choice of points $x_i$. We assume that they
are distributed randomly according to a measure, $\mu$, that has a positive continuous
density on $[0, B]$. Then let us call a set of functions entropically thin if for
any $\delta$

$$\frac{1}{n} N_n(\delta, X) \to 0 \quad \text{as } n \to \infty, \tag{18}$$



where convergence is in probability. Intuitively, a set of functions is entropically 
thin if all its functions can be well approximated by functions from a relatively 
"small" subset.^ 

Next, a class of functions, $X$, is called uniformly square integrable if

$$\lim_{C \to \infty} \sup_{f \in X} \int_{|f| > C} f^2\, dx = 0. \tag{19}$$



A somewhat weaker version of van de Geer's result is sufficient for our purposes.
It says that if a set $X$ is uniformly square integrable and entropically thin, then
the constrained least squares method is $L^2$-consistent.
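The covering number that enters the definition of $\delta$-entropy is rarely computed exactly, but for a finite family of functions it can be bounded from above by a greedy construction. The sketch below assumes a hypothetical family of lines $c\,x$ and equally spaced design points; the greedy cover size is only an upper bound on the minimal covering number, and all names are illustrative.

```python
import math

def empirical_distance(f, g, points):
    """Distance sqrt(mean (f - g)^2) between f and g at the design points."""
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in points) / len(points))

def greedy_cover_size(functions, points, delta):
    """Size of a greedily built delta-cover: an upper bound on the covering number."""
    centers = []
    for f in functions:
        if all(empirical_distance(f, c, points) > delta for c in centers):
            centers.append(f)      # f is not covered yet, so it becomes a new center
    return len(centers)

points = [i / 50 for i in range(51)]                       # hypothetical design points
family = [lambda x, c=c / 20: c * x for c in range(21)]    # lines c*x, c = 0, 0.05, ..., 1
for delta in (1.0, 0.1, 0.001):
    size = greedy_cover_size(family, points, delta)
    print(f"delta={delta}: cover size <= {size}, entropy <= {math.log(size):.3f}")
```

For a fixed $\delta$ the logarithm of the cover size stays bounded as more design points are added to this one-parameter family, so dividing by the number of points drives the ratio to zero, in line with the thinness condition.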
Proof of Theorem 3: Any $S \in \mathcal{R}$ is representable as

$$S(\theta) = -\int_0^B P(x)\, dF(x, \theta). \tag{20}$$



Since the set $\mathcal{P}$ is uniformly bounded and $F(x, \theta)$ is uniformly bounded in variation,
the set $\mathcal{R}$ is also uniformly bounded. Consequently, it is uniformly square integrable.

Similarly, since $F(x, \theta)$ is uniformly Lipschitz in $\theta$, and $\mathcal{P}$ is uniformly
bounded, the set $\mathcal{R}$ is uniformly Lipschitz:



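$$|S(\theta_1) - S(\theta_2)| = \left| \int_0^B [F(x, \theta_1) - F(x, \theta_2)]\, dP(x) \right| \tag{21}$$
$$\le \int_0^B C_1\, |\theta_1 - \theta_2|\, dP(x) \tag{22}$$
$$\le B C_1 C_2\, |\theta_1 - \theta_2|. \tag{23}$$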



Consequently, by Lemma 3.3.1 in van de Geer (1987), $\mathcal{R}$ is entropically thin.
Therefore, van de Geer's theorem can be applied and the constrained least
squares estimator is $L^2$-consistent.
QED.

The conditions of Theorem 3 are not very restrictive. For example, the set
of payoff functions for puts is uniformly Lipschitz:

$$|\max\{K_1 - x, 0\} - \max\{K_2 - x, 0\}| \le |K_1 - K_2|. \tag{24}$$

It is also clearly uniformly bounded in variation over $x \in [0, B]$, provided that
we consider only a bounded set of strikes: $\max\{K - x, 0\} \le \bar{K} = \max K$.
Finally, the pricing operator, $A$, is injective if $\bar{K} \ge B$. Therefore, Theorem 3 is
applicable.
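The Lipschitz property of put payoffs in the strike is easy to spot-check by simulation. The following sketch draws random values of $x$, $K_1$ and $K_2$ from an assumed interval $[0, 1]$ and records the largest observed ratio of the payoff difference to the strike difference; the names and the sample size are illustrative.

```python
import random

def put_payoff(x, K):
    """Payoff of a European put with strike K when the underlying ends at x."""
    return max(K - x, 0.0)

random.seed(1)
B = 1.0                                  # hypothetical upper bound for prices and strikes
worst_ratio = 0.0
for _ in range(10_000):
    x, K1, K2 = (random.uniform(0.0, B) for _ in range(3))
    if K1 != K2:
        ratio = abs(put_payoff(x, K1) - put_payoff(x, K2)) / abs(K1 - K2)
        worst_ratio = max(worst_ratio, ratio)
print("largest observed Lipschitz ratio:", worst_ratio)
```

Up to floating-point rounding the ratio never exceeds one, so the Lipschitz constant can be taken equal to 1 regardless of $x$.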

^The concept of entropy in relation to totally bounded sets of functions was introduced by
Kolmogorov and Tikhomirov (1959). It was applied to the problem of consistency in
non-parametric estimation by Vapnik and Cervonenkis (1981). For a textbook presentation, see
Pollard (1984).
While asymptotically consistent, the constrained least squares method may perform
poorly in small samples. It fails to take into account such possible prior beliefs
as that the pricing kernel is smooth, or unimodal, or that it is approximately
proportional to an infinitely divisible probability distribution, etc. In the next
section we consider a modification of the method of constrained least squares
that allows prior information to be taken into account.

5 Relaxed Maximum Relative Entropy Method 

In this section we will for simplicity restrict the discussion to the case when
the pricing kernel is estimated from put prices. The relaxed maximum entropy
method penalizes both the degree to which the model fails to explain the
price data and the model's deviation from a prior model:

$$\min_{P(x)} \left\{ \frac{1}{N} \sum_{i=1}^{N} (S(K_i) - S_i)^2 + \lambda \int_0^B \ln \frac{dP(x)}{dP_0(x)}\, dP(x) \right\}, \tag{25}$$

where $S_i$ is the observed price of the put with strike $K_i$,

$$S(K) = AP(x) = \int_0^K P(x)\, dx, \tag{26}$$

and $P_0(x)$ is a prior cumulative pricing kernel.
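The penalized criterion of the relaxed maximum entropy method, a mean squared pricing error plus a multiple of the relative entropy with respect to the prior, can be written down for a discrete kernel as follows. This is a sketch under assumptions: the kernel is taken to have point masses $w_j$ at fixed knots, the prior has masses $w_{0,j}$ at the same knots, and the names, knots, strikes, and weights are hypothetical.

```python
import math

def relative_entropy(w, w0):
    """Discrete analogue of int ln(dP/dP0) dP: sum_j w_j ln(w_j / w0_j), with 0 ln 0 = 0."""
    return sum(wj * math.log(wj / w0j) for wj, w0j in zip(w, w0) if wj > 0.0)

def model_put_price(w, knots, K):
    """Put price int_0^K P(x) dx when P jumps by w_j at knot t_j."""
    return sum(wj * max(K - t, 0.0) for wj, t in zip(w, knots))

def relaxed_objective(w, w0, knots, strikes, prices, lam):
    """Mean squared pricing error plus lam times relative entropy to the prior."""
    mse = sum((model_put_price(w, knots, K) - s) ** 2
              for K, s in zip(strikes, prices)) / len(strikes)
    return mse + lam * relative_entropy(w, w0)

knots = [0.2, 0.4, 0.6, 0.8]               # hypothetical support of the kernel
prior = [0.25, 0.25, 0.25, 0.25]           # hypothetical prior masses
candidate = [0.1, 0.4, 0.4, 0.1]           # hypothetical candidate kernel
strikes = [0.5, 0.7, 0.9]
prices = [model_put_price(candidate, knots, K) for K in strikes]   # noiseless data

for lam in (0.0, 0.1, 1.0):
    at_candidate = relaxed_objective(candidate, prior, knots, strikes, prices, lam)
    at_prior = relaxed_objective(prior, prior, knots, strikes, prices, lam)
    print(f"lambda={lam}: objective at candidate={at_candidate:.5f}, at prior={at_prior:.5f}")
```

With a small weight the data term dominates and the candidate that reproduces the prices is preferred; as the weight grows the entropy term pulls the preferred kernel toward the prior.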

Recall that the regular maximum entropy method is described by the following
minimization problem:

$$P_{ME}(x) = \arg\min_{P(x)} \left\{ \int_0^B \ln \frac{dP(x)}{dP_0(x)}\, dP(x) \quad \text{s.t. } S(K_i) = S_i \text{ for each } i \right\}. \tag{27}$$

If prices are contaminated with noise, then the regular maximum entropy method may
run into difficulties with the existence of a solution and is unlikely to be
consistent. In the relaxed maximum entropy method the constraints are not rigid: they
are replaced by a penalty term in the objective function. Consequently,
the solution is guaranteed to exist. What about consistency?

Theorem 4 There is a sequence of positive constants $\lambda_N$ such that the
relaxed maximum entropy method estimates the cumulative pricing kernel, $P(x)$,
consistently in the $L^2$ norm.

Proof: By a lemma below, there is a sequence $\lambda_N \to 0$ such that as $N \to \infty$,
the solution of the problem

$$\min_{S(K)} \left\{ \frac{1}{N} \sum_{i=1}^{N} (S(K_i) - S_i)^2 + \lambda_N \int_0^B \ln \frac{dP(x)}{dP_0(x)}\, dP(x) \right\} \tag{28}$$

with probability 1 approaches in the $L^2$ norm the solution of the constrained least
squares problem:

$$\min_{S(K)} \frac{1}{N} \sum_{i=1}^{N} (S(K_i) - S_i)^2 \quad \text{s.t. } \frac{d^2 S}{dK^2} \ge 0. \tag{29}$$

By Theorem 2, as $N \to \infty$, the solution of the constrained least squares problem
with probability 1 approaches the true pricing function $S(K)$.

By the standard diagonal process argument there is a sequence $\lambda_N$
such that the solution of

$$\min_{S(K)} \left\{ \frac{1}{N} \sum_{i=1}^{N} (S(K_i) - S_i)^2 + \lambda_N \int_0^B \ln \frac{dP(x)}{dP_0(x)}\, dP(x) \right\} \tag{30}$$

approaches in $L^2$ the true function $S(K)$ as $N \to \infty$. Because differentiation
is a continuous operator on the set of convex non-decreasing functions, $P(x)$ is
also estimated consistently in $L^2$. QED.

In the proof of Theorem 4 we have used the following lemma. Consider the
problem:

$$\min_{R} \{ F_N(R) + \lambda_N G(R) \},$$

where $F_N$ and $G$ are continuous functionals of $R(x)$. Let $\tilde{R}_N$ be the solution of the
problem, and $R_N$ be the solution for $\lambda = 0$. Let the sequence of functionals $F_N$
be called proper on $\mathcal{R}$ if for any $\varepsilon$ we can find a $\delta$ such that for all sufficiently large
$N$ and all $R \in \mathcal{R}$, the condition $F_N(R) - F_N(R_N) < \delta$ implies that
$\|R - R_N\|_{L^2} < \varepsilon$.

Lemma 2 If $\{F_N\}$ is proper on $\mathcal{R}$, then there exists a sequence $\lambda_N$ such that
$\tilde{R}_N - R_N$ converges to zero.

Proof: Take an $\varepsilon$ and select $\delta$ and $N_0$ as in the definition of properness;
then for any $R \in \mathcal{R}$ and any $N > N_0$, from $\|R - R_N\|_{L^2} \ge \varepsilon$ it follows that
$F_N(R) - F_N(R_N) \ge \delta$. On the other hand, from the continuity of $F_N$ it follows that
we can find an $\varepsilon_1$ such that $\|R - R_N\|_{L^2} < \varepsilon_1$ implies $F_N(R) - F_N(R_N) < \delta/2$.

Also, since $G$ is continuous, we can find an $R'$ inside the $\varepsilon_1$-neighborhood
of $R_N$ such that $|G(R') - G(R_N)| < c$. Consequently we can find a $\lambda$ such that $\lambda G(R') <
\delta/2$. Then it is clear that the minimizer of $F_N(R) + \lambda G(R)$ cannot be outside
of the $\varepsilon$-neighborhood of $R_N$: $R'$ would improve on it.

Thus, for any $\varepsilon$, there are $N_0$ and $\lambda$ such that for $N > N_0$ the solution of
$\min\{F_N + \lambda G\}$ is in the $\varepsilon$-neighborhood of $R_N$. QED.
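The mechanism behind Lemma 2 can be seen in a one-dimensional toy problem. Assume, purely for illustration, that $R$ is a scalar, $F(R) = (R - a)^2$ and $G(R) = (R - b)^2$; setting the derivative of $F + \lambda G$ to zero gives the minimizer $(a + \lambda b)/(1 + \lambda)$, which approaches the unpenalized solution $a$ as $\lambda \to 0$. The values of $a$ and $b$ below are hypothetical.

```python
def penalized_minimizer(a, b, lam):
    """Minimizer of F(R) + lam * G(R) with F(R) = (R - a)^2 and G(R) = (R - b)^2."""
    return (a + lam * b) / (1.0 + lam)

a, b = 2.0, 0.0      # hypothetical unpenalized solution a and prior value b
for lam in (1.0, 0.1, 0.01, 0.001):
    r = penalized_minimizer(a, b, lam)
    print(f"lambda={lam}: minimizer={r:.5f}, distance to unpenalized={abs(r - a):.5f}")
```

The distance equals $\lambda |a - b| / (1 + \lambda)$, so any sequence of weights tending to zero brings the penalized solution arbitrarily close to the unpenalized one in this toy case.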

This lemma can be used in the proof of Theorem 4 because the functional

$$F_N(R) = \frac{1}{N} \sum_{i=1}^{N} (R(K_i) - S_i)^2 \tag{31}$$

is proper. Indeed, note that this functional has a nice special property:
Lemma 3 If $F_N(R) - F_N(R_N) < \varepsilon$ then $F_N(R - R_N) < \varepsilon$.

Proof: Let $R = R_N + \delta R$. Since $F_N$ is a quadratic form, we can define a
corresponding bilinear product:

$$(f, g) = \frac{1}{2} \{ F_N(f + g) - F_N(f) - F_N(g) \}. \tag{32}$$

We claim that

$$(R_N, \delta R) \ge 0. \tag{33}$$

Indeed, since the set of convex non-decreasing functions, $\mathcal{R}$, is convex, $R_\alpha =
R_N + \alpha\, \delta R \in \mathcal{R}$ for any $\alpha \in [0, 1]$. Consequently, if (33) were violated we could
find an $\alpha$ such that $(R_\alpha, R_\alpha) < (R_N, R_N)$, which would contradict the optimality of
$R_N$. Using (33), we can write:

$$F_N(\delta R) + F_N(R_N) \le F_N(R) < F_N(R_N) + \varepsilon, \tag{34}$$

and $F_N(\delta R) < \varepsilon$. QED.

So, to obtain properness of $\{F_N\}$ it remains to prove that from $F_N(R - R_N) < \varepsilon$
for all large $N$ we can conclude that $\|R - R_N\|_{L^2} < \varepsilon$. By assumption, the
points $\{x_i\}$ are distributed with density $p(x) > k > 0$ on $[0, B]$. Then we can
use the following lemma:

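Lemma 4 If $f(x)$ is non-negative and has finite variation on $[0, B]$ then from

$$\frac{1}{N} \sum_{i=1}^{N} f(x_i) < \varepsilon \quad \text{for any } N > N_0, \tag{35}$$

it follows that

$$\int_0^B f(x)\, dx \le \frac{\varepsilon}{k}. \tag{36}$$

Proof: The sum converges to $\int_0^B f(x) p(x)\, dx$. Therefore,

$$\int_0^B f(x)\, dx = \int_0^B \frac{f(x)}{p(x)}\, p(x)\, dx \le \frac{1}{k} \int_0^B f(x) p(x)\, dx \le \frac{\varepsilon}{k}. \tag{37}$$

QED.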
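The two steps of this argument, convergence of the sample mean to $\int f p\, dx$ and the bound of $\int f\, dx$ by that limit divided by $k$, can be illustrated by a small simulation. The density $p(x) = 0.5 + x$ on $[0, 1]$ (so $k = 0.5$), the function $f(x) = x^2$, and the sample size are assumptions made for this sketch only.

```python
import math
import random

def sample_point(u):
    """Inverse-CDF draw from the density p(x) = 0.5 + x on [0, 1], so p >= k = 0.5."""
    return (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0

random.seed(2)
k = 0.5
f = lambda x: x * x                       # hypothetical non-negative test function
N = 20_000
sample_mean = sum(f(sample_point(random.random())) for _ in range(N)) / N

integral_fp = 5.0 / 12.0                  # exact int_0^1 x^2 (0.5 + x) dx
integral_f = 1.0 / 3.0                    # exact int_0^1 x^2 dx
print("sample mean:", sample_mean, "limit:", integral_fp)
print("int f =", integral_f, "bound =", integral_fp / k)
```

The inverse CDF comes from solving $0.5x + x^2/2 = u$ for $x$; the bound is loose here because the density is well above its lower bound $k$ over most of the interval.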

Properness of the functional $F_N$ follows from this lemma applied to $f(x) =
(R(x) - R_N(x))^2$.

6 Conclusion 

It is proved that the mapping from the set of non-decreasing cumulative pricing 
kernels to security prices corresponds to a well-posed inverse problem, and that 
the constrained least squares method provides a consistent estimator of the
cumulative pricing kernel.

It is also suggested that in small samples the performance of the constrained
least squares method can be improved by a modification that takes into account that
the pricing kernel should be close to a certain prior kernel. It is proved that
this method is consistent.